Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH
نویسندگان
چکیده
In this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing. These new LSH families are tested on both synthetic and real datasets with very good results and a considerable performance improvement over other LSH families. While the theoretical analysis is done for the angular distance, these families can also be used in practice for the euclidean distance with excellent results [2]. Our tests using real datasets show that the proposed LSH functions work well for the euclidean distance.
منابع مشابه
Fast Cross-Polytope Locality-Sensitive Hashing
We provide a variant of cross-polytope locality sensitive hashing with respect to angular distance which is provably optimal in asymptotic sensitivity and enjoys O(d ln d) hash computation time. Building on a recent result in [AIL15], we show that optimal asymptotic sensitivity for cross-polytope LSH is retained even when the dense Gaussian matrix is replaced by a fast Johnson-Lindenstrauss tra...
متن کاملS2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions
To compare the similarity of probability distributions, the information-theoretically motivated metrics like KullbackLeibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable compared with metrics for vectors like Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metric...
متن کاملK-medoids LSH: a new locality sensitive hashing in general metric space
The increasing availability of multimedia content poses a challenge for information retrieval researchers. Users want not only have access to multimedia documents, but also make sense of them the ability of finding specific content in extremely large collections of textual and non-textual documents is paramount. At such large scales, Multimedia Information Retrieval systems must rely on the abi...
متن کاملLocality sensitive hashing: A comparison of hash function types and querying mechanisms
It is well known that high-dimensional nearest-neighbor retrieval is very expensive. Dramatic performance gains are obtained using approximate search schemes, such as the popular Locality-Sensitive Hashing (LSH). Several extensions have been proposed to address the limitations of this algorithm, in particular, by choosing more appropriate hash functions to better partition the vector space. All...
متن کاملPractical and Optimal LSH for Angular Distance
We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [1, 2]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [3] in practice. We also introduce a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1704.04684 شماره
صفحات -
تاریخ انتشار 2017